Scalable Parallel Topic Models

Authors

  • David Newman
  • Padhraic Smyth
  • Mark Steyvers
Abstract

The topic model is a popular probabilistic model for text and document modeling. It can be used for topic indexing, document classification, corpus summarization and information retrieval. In the past, topic models have been applied to corpora containing thousands to hundreds of thousands of documents. Now there is an increasing need to model collections with millions to billions of documents. We present a parallel algorithm for the topic model that has linear speedup and high parallel efficiency for shared-memory symmetric multiprocessors (SMPs). Using this parallel algorithm, topic model computations on an 8-processor system took 1/7 the time of the same computation on a single processor.
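The speedup figure quoted in the abstract can be turned into a parallel efficiency with a short calculation. The sketch below is only an illustration of that arithmetic: the function names and the normalized timings (7 time units on one processor, 1 unit on eight) are assumptions chosen to match the stated 1/7 ratio, not measurements from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup = single-processor time divided by parallel time."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, n_processors: int) -> float:
    """Efficiency = speedup divided by the number of processors used."""
    return speedup(t_serial, t_parallel) / n_processors


# Hypothetical normalized timings: the abstract only gives the ratio 1/7,
# so the absolute values here are placeholders.
t1 = 7.0  # time on a single processor (arbitrary units)
t8 = 1.0  # time on the 8-processor system (same units)

s = speedup(t1, t8)
e = parallel_efficiency(t1, t8, 8)
print(f"speedup = {s:.1f}x, efficiency = {e:.1%}")
```

Under these assumptions, the 8-processor run corresponds to a 7x speedup and a parallel efficiency of 7/8, which is close to the ideal of 8x.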

Similar resources

Scalable Inference for Logistic-Normal Topic Models

Logistic-normal topic models can effectively discover correlation structures among latent topics. However, their inference remains a challenge because of the non-conjugacy between the logistic-normal prior and multinomial topic mixing proportions. Existing algorithms either make restricting mean-field assumptions or are not scalable to large-scale applications. This paper presents a partially c...

Models, Inference, and Implementation for Scalable Probabilistic Models of Text

Title of dissertation: Models, Inference, and Implementation for Scalable Probabilistic Models of Text Ke Zhai, Ph.D., 2014 Dept. of Computer Science Dissertation directed by: Professor Jordan Boyd-Graber iSchool, UMIACS Unsupervised probabilistic Bayesian models are powerful tools for statistical analysis, especially in the area of information retrieval, document analysis and text processing. ...

Scalable Parallel Octree Using HPX With Hilbert Curve

and D. Fey, “Hpx – a task based programming model in a global address space,” PGAS 2014: The 8th International Conference on Partitioned Global Address Space Programming Models, 2014. [2] Zahra Khatami, Hartmut Kaiser, Patricia Grubel, Bryce Adelstein-Lelbach, Adrian Serio and J. Ramanujam, “A massively parallel distributed N-Body simulation code implemented with HPX”, 28th ACM Symposium on Par...

Parallelization of Rich Models for Steganalysis of Digital Images using a CUDA-based Approach

There are several methods for building an efficient strategy for steganalysis of digital images. A very powerful method in this area is the rich model, which consists of a large number of diverse sub-models in both the spatial and transform domains. However, extracting the various types of features from an image is very time consuming in some steps, especially for training pha...

Publication date: 2007